IEEE INFOCOM 2024

Session E-8: Machine Learning 2

Conference: 8:30 AM — 10:00 AM PDT
Local: May 23 Thu, 11:30 AM — 1:00 PM EDT
Location: Regency E

Deep Learning Models As Moving Targets To Counter Modulation Classification Attacks

Naureen Hoque and Hanif Rahbari (Rochester Institute of Technology, USA)

Malicious entities use advanced modulation classification (MC) techniques to launch traffic analysis, selective jamming, evasion, and poisoning attacks. Recent studies show that current defense mechanisms against such attacks are static in nature and vulnerable to persistent adversaries who invest time and resources into learning the defenses, enabling them to design and execute more sophisticated attacks that circumvent them. In this paper, we present a moving-target defense framework to support a novel modulation-masking mechanism we develop against advanced and persistent modulation classification attacks. The modulated symbols are masked with small perturbations before transmission so that they appear to belong to a different modulation scheme. By deploying a pool of deep learning models and perturbation-generation techniques, the defense strategy keeps changing (moving) them when needed, making it difficult for adversaries to keep up with the defense system's changes over time. We show that the overall system performance remains unaffected under our technique. We further demonstrate that our masking technique, like other existing defenses, can be learned and circumvented over time by a persistent adversary unless a moving-target defense approach is adopted.
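As a rough illustration of the masking idea (not the authors' actual scheme; the constellations, perturbation budget, and nearest-point rule below are all assumptions), a toy numpy sketch might nudge each QPSK symbol toward a point of a rotated 8-PSK grid under a small power cap:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy QPSK transmission (unit-power symbols at odd multiples of pi/4).
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, 1000)))

# Target constellation the masked symbols should mimic: an 8-PSK grid
# rotated so its points do not coincide with the QPSK points.
psk8 = np.exp(1j * (np.pi / 8 + 2 * np.pi * np.arange(8) / 8))

# Mask: nudge each symbol toward its nearest target point, capping the
# perturbation norm (eps) so the legitimate receiver's EVM stays small.
eps = 0.15
nearest = psk8[np.argmin(np.abs(qpsk[:, None] - psk8[None, :]), axis=1)]
delta = nearest - qpsk
delta *= np.minimum(1.0, eps / np.maximum(np.abs(delta), 1e-12))
masked = qpsk + delta

print("mean perturbation power:", np.mean(np.abs(delta) ** 2))
```

A moving-target deployment would then rotate the budget, the target constellation, and the perturbation generator (e.g., one of a pool of pre-trained DNNs) over time.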

Deep Learning-based Modulation Classification of Practical OFDM signals for Spectrum Sensing

Byungjun Kim (UCSD, USA); Peter Gerstoft (University of California, San Diego, USA); Christoph F Mecklenbräuker (TU Wien, Austria)

In this study, the modulation of symbols on OFDM subcarriers is classified for transmissions following Wi-Fi 6 and 5G downlink specifications. First, our approach estimates the OFDM symbol duration and cyclic prefix length from the cyclic autocorrelation function. We then propose a feature extraction algorithm that characterizes the modulation of OFDM signals and removes the effects of synchronization errors. The extracted feature is converted into a 2D histogram of phase and amplitude, which is taken as input to a convolutional neural network (CNN)-based classifier. The classifier requires no prior knowledge of protocol-specific information such as the Wi-Fi preamble or the resource allocation of 5G physical channels. Evaluated on synthetic and real-world measured over-the-air (OTA) datasets, the classifier achieves a minimum accuracy of 97% on OTA data when the SNR is above the value required for data transmission.
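The 2D histogram input described above is easy to sketch; the following toy example (synthetic 16-QAM symbols standing in for the feature-extracted subcarrier symbols; bin counts and ranges are assumptions) shows the kind of phase-amplitude image the CNN would consume:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic equalized subcarrier symbols: 16-QAM plus noise, standing in
# for the extracted OFDM features after synchronization-error removal.
levels = np.array([-3, -1, 1, 3]) / np.sqrt(10)
syms = rng.choice(levels, 4096) + 1j * rng.choice(levels, 4096)
syms += rng.normal(0, 0.05, 4096) + 1j * rng.normal(0, 0.05, 4096)

# 2D histogram over (phase, amplitude) -- the CNN classifier's input image.
phase = np.angle(syms)
amp = np.abs(syms)
hist, _, _ = np.histogram2d(phase, amp, bins=(32, 32),
                            range=[[-np.pi, np.pi], [0, 2.0]])
hist /= hist.sum()  # normalize so the input is scale independent
print(hist.shape)   # (32, 32) image fed to the CNN
```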

Resource-aware Deployment of Dynamic DNNs over Multi-tiered Interconnected Systems

Chetna Singhal (Indian Institute of Technology Kharagpur, India); Yashuo Wu (University of California Irvine, USA); Francesco Malandrino (CNR-IEIIT, Italy); Marco Levorato (University of California, Irvine, USA); Carla Fabiana Chiasserini (Politecnico di Torino & CNIT, IEIIT-CNR, Italy)

The increasing pervasiveness of intelligent mobile applications requires exploiting the full range of resources offered by the mobile-edge-cloud network for the execution of inference tasks. However, due to the heterogeneity of such multi-tiered networks, it is essential to match the applications' demand to the available resources while minimizing energy consumption. Modern dynamic deep neural networks (DNNs) achieve this goal through multi-branched architectures in which early exits enable sample-based adaptation of the model depth. In this paper, we tackle the problem of allocating sections of DNNs with early exits to the nodes of the mobile-edge-cloud system. Through a 3-stage graph-modeling approach, we represent the possible options for splitting the DNN and deploying the DNN blocks on the multi-tiered network, embedding both the system constraints and the application requirements in a convenient and efficient way. Our framework, named Feasible Inference Graph (FIN), identifies the solution that minimizes the overall inference energy consumption while enabling distributed inference over the multi-tiered network with the target quality and latency. Our results, obtained for DNNs of varying complexity, show that FIN matches the optimum and yields over 65% energy savings relative to a state-of-the-art technique for cost minimization.
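As a loose illustration of the graph idea (node names, energy and latency numbers, and the tier layout below are invented, not FIN's actual construction), one can model split-and-placement options as a DAG and pick the cheapest latency-feasible path:

```python
import networkx as nx

# Each node is (tier, DNN block index); edges carry an energy cost and a
# latency, so a source-to-sink path is one feasible split/placement.
G = nx.DiGraph()
G.add_edge(("src", 0), ("device", 1), energy=1.0, latency=5.0)
G.add_edge(("device", 1), ("edge", 2), energy=2.0, latency=10.0)
G.add_edge(("device", 1), ("device", 2), energy=4.0, latency=3.0)
G.add_edge(("edge", 2), ("cloud", 3), energy=1.5, latency=20.0)
G.add_edge(("device", 2), ("cloud", 3), energy=3.0, latency=18.0)

LATENCY_BUDGET = 40.0
best = None
for path in nx.all_simple_paths(G, ("src", 0), ("cloud", 3)):
    edges = list(zip(path, path[1:]))
    e = sum(G.edges[u, v]["energy"] for u, v in edges)
    l = sum(G.edges[u, v]["latency"] for u, v in edges)
    if l <= LATENCY_BUDGET and (best is None or e < best[0]):
        best = (e, path)
print(best)  # minimum-energy deployment meeting the latency target
```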

Jewel: Resource-Efficient Joint Packet and Flow Level Inference in Programmable Switches

Aristide Tanyi-Jong Akem (IMDEA Networks Institute, Spain & Universidad Carlos III de Madrid, Spain); Beyza Butun (Universidad Carlos III de Madrid & IMDEA Networks Institute, Spain); Michele Gucciardo and Marco Fiore (IMDEA Networks Institute, Spain)

Embedding machine learning (ML) models in programmable switches realizes the vision of high-throughput, low-latency inference at line rate. Recent works have made breakthroughs in embedding Random Forest (RF) models in switches for either packet-level or flow-level inference. The former relies on packet-header features that are simple to implement but limit accuracy in challenging use cases; the latter exploits richer flow features to improve accuracy, but leaves early packets in each flow unclassified. We propose Jewel, an in-switch ML model based on a fully joint packet- and flow-level design, which takes the best of both worlds by classifying early flow packets individually and shifting to flow-level inference when possible. Our proposal involves (i) a single RF model trained to classify both packets and flows, and (ii) hardware-aware model selection and training techniques that minimize the resource footprint. We implement Jewel in P4 and deploy it in a testbed with Intel Tofino switches, where we run extensive experiments with a variety of real-world use cases. Results show that our solution outperforms four state-of-the-art benchmarks, with accuracy gains in the 2.2%-5.3% range.
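A host-side sketch of the joint training idea with scikit-learn follows; the feature choices and zero-padding convention are assumptions, and the actual system compiles the trees into P4 match-action tables rather than running Python on the switch:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)

# One training row per packet: header features are always present, while
# flow-level aggregates are zero-padded for the first packets of a flow
# and filled in once enough packets have been observed.
n = 2000
pkt_feats = rng.normal(size=(n, 3))    # e.g., size, IAT, TCP flags
flow_feats = rng.normal(size=(n, 3))   # e.g., mean size, duration
early = rng.random(n) < 0.3            # early packets of each flow
flow_feats[early] = 0.0                # flow stats not yet available
X = np.hstack([pkt_feats, flow_feats, early[:, None].astype(float)])
y = rng.integers(0, 2, n)              # toy traffic classes

# A single shallow RF handles both regimes; depth and tree count are kept
# small so the model fits the switch's match-action stage budget.
rf = RandomForestClassifier(n_estimators=5, max_depth=5, random_state=0)
rf.fit(X, y)
print(rf.predict(X[:4]))
```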

Session Chair

Marilia Curado (University of Coimbra, Portugal)

Session E-9: Machine Learning 3

Conference: 10:30 AM — 12:00 PM PDT
Local: May 23 Thu, 1:30 PM — 3:00 PM EDT
Location: Regency E

Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules

Xinglin Pan (Hong Kong Baptist University, Hong Kong); Wenxiang Lin and Shaohuai Shi (Harbin Institute of Technology, Shenzhen, China); Xiaowen Chu (The Hong Kong University of Science and Technology (Guangzhou) & The Hong Kong University of Science and Technology, Hong Kong); Weinong Sun (The Hong Kong University of Science and Technology, Hong Kong); Bo Li (Hong Kong University of Science and Technology, Hong Kong)

Sparsely-activated Mixture-of-Experts (MoE) layers have found practical application in enlarging the model size of large-scale foundation models, with only a sub-linear increase in computation demands. Despite the wide adoption of hybrid parallel paradigms like model parallelism, expert parallelism, and expert-sharding parallelism (i.e., MP+EP+ESP) to support MoE model training on GPU clusters, training efficiency is hindered by the communication costs these paradigms introduce. To address this limitation, we propose Parm, a system that accelerates MP+EP+ESP training with two dedicated schedules for placing communication tasks. The proposed schedules eliminate redundant computations and communications and enable overlaps between intra-node and inter-node communications, ultimately reducing the overall training time. As the two schedules are not mutually exclusive, we provide comprehensive theoretical analyses and derive an automatic and accurate solution to determine which schedule should be applied in different scenarios. Experimental results on an 8-GPU server and a 32-GPU cluster demonstrate that Parm outperforms the state-of-the-art MoE training system DeepSpeed-MoE, achieving a 1.13x-5.77x speedup on 1296 manually configured MoE layers and approximately 3x improvement on two real-world MoE models based on BERT and GPT-2.
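As a hand-wavy illustration of the schedule-selection step (the timing expressions below are invented placeholders, not Parm's derived cost model), one can compare modeled per-step times and pick the faster schedule:

```python
# Pick between two communication schedules for an MoE layer by comparing
# modeled step times; Parm derives such terms analytically, whereas the
# formulas and numbers here are stand-ins for illustration only.
def step_time(t_intra, t_inter, t_comp, overlap_intra_inter):
    if overlap_intra_inter:
        # Schedule B: intra-node and inter-node transfers overlap.
        return max(t_intra, t_inter) + t_comp
    # Schedule A: redundant phases removed, but transfers serialized.
    return t_intra + t_inter + t_comp

cfgs = {"schedule_A": False, "schedule_B": True}
t_intra, t_inter, t_comp = 2.0, 3.5, 4.0  # measured per-layer times (ms)
best = min(cfgs, key=lambda k: step_time(t_intra, t_inter, t_comp, cfgs[k]))
print(best, step_time(t_intra, t_inter, t_comp, cfgs[best]))
```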

Predicting Multi-Scale Information Diffusion via Minimal Substitution Neural Networks

Ranran Wang (University of Electronic Science and Technology of China, China); Yin Zhang (University of Electronic Science and Technology, China); Wenchao Wan and Xiong Li (University of Electronic Science and Technology of China, China); Min Chen (Huazhong University of Science and Technology, China)

Information diffusion prediction is a complex task due to the numerous variables present in large social platforms like Weibo and Twitter. While many researchers have focused on the internal influence of individual cascades, they often overlook other influential factors, including competition and cooperation among information items, the attractiveness of information to users, and the potential impact of content anticipation on further diffusion. Traditional methods that model each piece of information in isolation struggle to account for these aspects comprehensively. To address these issues, we propose MIDPMS, a multi-scale information diffusion prediction method built on a minimal substitution neural network. Specifically, to enable macro-scale popularity prediction and micro-scale diffusion prediction simultaneously, we model information diffusion as a substitution process among different information sources. Furthermore, accounting for the life cycle of content, user preferences, and potential content anticipation, we introduce minimal substitution theory and design a minimal substitution neural network to model this substitution system and facilitate joint training of the macroscopic and microscopic prediction tasks. Extensive experiments on Weibo and Twitter datasets demonstrate that MIDPMS significantly outperforms state-of-the-art methods on both datasets across both multi-scale tasks.
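One way to picture the substitution view (purely illustrative dynamics, not the authors' neural model) is items competing for a finite stream of attention via attractiveness scores that decay over the content life cycle:

```python
import numpy as np

# Cascades compete for a finite pool of user attention; each item's share
# of newly arriving attention is a softmax over its current attractiveness,
# which decays over its life cycle. Values are invented for illustration.
attract = np.array([2.0, 1.0, 0.5])   # per-item attractiveness
decay = np.array([0.9, 0.95, 0.99])   # life-cycle decay per step
pop = np.zeros(3)                     # cumulative popularity
for t in range(50):
    share = np.exp(attract) / np.exp(attract).sum()  # substitution weights
    pop += 100 * share                               # 100 new views/step
    attract *= decay
print(pop.round(1))
```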

Online Resource Allocation for Edge Intelligence with Colocated Model Retraining and Inference

Huaiguang Cai (Sun Yat-Sen University, China); Zhi Zhou (Sun Yat-sen University, China); Qianyi Huang (Sun Yat-Sen University, China & Peng Cheng Laboratory, China)

Due to several kinds of drift, the traditional computing paradigm of deploying a trained model and then performing inference can no longer meet accuracy requirements. Accordingly, a new computing paradigm has emerged in which the model is retrained and performs inference simultaneously on new data after deployment (we call it model inference and retraining co-location). The key challenge is how to allocate computing resources between model retraining and inference to improve long-term accuracy, especially when computing resources change dynamically.
We address this challenge by first modeling the relationship between model performance and different retraining and inference configurations, and then proposing a linear-complexity online algorithm (named \ouralg).
\ouralg approximately solves the original non-convex, integer, time-coupled problem by adjusting the proportion between model retraining and inference according to the available real-time computing resources. The competitive ratio of \ouralg is strictly better than the tight competitive ratio of the Inference-Only algorithm (corresponding to the traditional computing paradigm) when data drift persists for a sufficiently long time, demonstrating the advantages and applicability of the inference and retraining co-location paradigm. In particular, \ouralg translates into several heuristic algorithms in different environments. Experiments based on real scenarios confirm the effectiveness of \ouralg.
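A minimal sketch of the co-location paradigm (an invented drift-reactive splitting rule, not \ouralg itself) might look like:

```python
# Illustrative online heuristic: each step, split the available compute C_t
# between retraining and inference, shifting toward retraining when recent
# accuracy drops (a drift signal). All constants below are assumptions.
def split_compute(C_t, acc_recent, acc_target=0.9,
                  min_infer_frac=0.5, gain=2.0):
    drift = max(0.0, acc_target - acc_recent)
    retrain_frac = min(1.0 - min_infer_frac, gain * drift)
    return (1.0 - retrain_frac) * C_t, retrain_frac * C_t

for acc in (0.92, 0.85, 0.70):
    infer, retrain = split_compute(C_t=100.0, acc_recent=acc)
    print(f"acc={acc:.2f}: inference={infer:.0f}, retraining={retrain:.0f}")
```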

Tomtit: Hierarchical Federated Fine-Tuning of Giant Models based on Autonomous Synchronization

Tianyu Qi and Yufeng Zhan (Beijing Institute of Technology, China); Peng Li (The University of Aizu, Japan); Yuanqing Xia (Beijing Institute of Technology, China)

With the rapid evolution of giant models, the paradigm of pre-training models and then fine-tuning them for downstream tasks has become increasingly popular. The adapter has been recognized as an efficient fine-tuning technique and has attracted much research attention. However, adapter-based fine-tuning still faces the challenge of insufficient data. Federated fine-tuning has recently been proposed to fill this gap, but existing solutions suffer from a serious scalability issue and are inflexible in handling dynamic edge environments. In this paper, we propose Tomtit, a hierarchical federated fine-tuning system that significantly accelerates fine-tuning and improves the energy efficiency of devices. Through an extensive empirical study, we find that model synchronization schemes (i.e., when edges and devices should synchronize their models) play a critical role in federated fine-tuning. The core of Tomtit is a distributed design that gives each edge and device a unique synchronization scheme matched to its model structure, data distribution, and computing capability. Furthermore, we provide a theoretical guarantee on the convergence of Tomtit. Finally, we develop a prototype of Tomtit and evaluate it on a testbed. Experimental results show that it significantly outperforms the state of the art.
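To illustrate what a per-device synchronization scheme could look like (the scoring rule below is an assumption for illustration, not Tomtit's actual policy):

```python
# Heterogeneity-aware synchronization sketch: each device runs local
# adapter updates and only syncs with its edge every `interval` rounds.
# Here the interval grows with compute speed and shrinks with gradient
# divergence (a proxy for non-IID data); the formula is invented.
devices = [
    {"name": "d0", "speed": 1.0, "divergence": 0.10},
    {"name": "d1", "speed": 0.5, "divergence": 0.40},
    {"name": "d2", "speed": 2.0, "divergence": 0.05},
]
for d in devices:
    # Fast devices with near-IID data can safely sync less often.
    d["interval"] = max(1, round(4 * d["speed"] / (1 + 10 * d["divergence"])))
    print(d["name"], "syncs every", d["interval"], "local rounds")
```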

Session Chair

Marco Fiore (IMDEA Networks Institute, Spain)

Session E-10: Machine Learning 4

Conference: 1:30 PM — 3:00 PM PDT
Local: May 23 Thu, 4:30 PM — 6:00 PM EDT
Location: Regency E

Augment Online Linear Optimization with Arbitrarily Bad Machine-Learned Predictions

Dacheng Wen (The University of Hong Kong, Hong Kong); Yupeng Li (Hong Kong Baptist University, Hong Kong); Francis C.M. Lau (The University of Hong Kong, Hong Kong)

The online linear optimization paradigm is important to many real-world network applications as well as theoretical algorithmic studies. Recent studies have attempted to augment online linear optimization with machine-learned predictions of the cost function that are meant to improve the performance of the learner. However, they fail to address the realistic case where the predictions can be arbitrarily bad. In this work, we take the first step in studying online linear optimization with a dynamic number of arbitrarily bad machine-learned predictions per round and propose an algorithm termed OLOAP. Our theoretical analysis shows that, when the predictions are of satisfactory quality, OLOAP achieves a regret bound of O(log T), which circumvents the tight lower bound of Ω(√T) for the vanilla problem of online linear optimization (i.e., the one without any predictions). Meanwhile, the regret of our algorithm is never worse than O(√T), irrespective of the quality of the predictions. In addition, we derive a lower bound on the regret of the studied problem, which demonstrates that OLOAP is near-optimal. We consider two important network applications and conduct extensive evaluations. Our results validate the superiority of our algorithm over state-of-the-art approaches.
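OLOAP's construction is in the paper; a generic sketch of how such robustness-consistency trade-offs are often obtained is to hedge between a prediction-following expert and a standard no-regret learner, e.g.:

```python
import numpy as np

rng = np.random.default_rng(3)
d, T, eta = 5, 1000, 0.05
w = np.ones(2)                      # meta-weights: [follow-prediction, OGD]
x_ogd = np.zeros(d)
total = 0.0

for t in range(T):
    cost = rng.normal(size=d)                     # true cost vector
    pred = cost + rng.normal(scale=3.0, size=d)   # possibly bad prediction
    x_pred = -pred / max(np.linalg.norm(pred), 1e-9)  # trust the prediction
    p = w / w.sum()
    x_play = p[0] * x_pred + p[1] * x_ogd             # hedged action
    total += cost @ x_play
    # meta-update: downweight the expert that incurred more cost this round
    w *= np.exp(-eta * np.array([cost @ x_pred, cost @ x_ogd]))
    # robust expert: online gradient descent, projected onto the unit ball
    x_ogd = x_ogd - 0.1 * cost
    x_ogd /= max(np.linalg.norm(x_ogd), 1.0)

print("cumulative cost:", round(total, 2))
```

When predictions are accurate the meta-learner concentrates on the prediction expert; when they are arbitrarily bad, it falls back to the OGD expert's worst-case guarantee.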

Dancing with Shackles, Meet the Challenge of Industrial Adaptive Streaming via Offline Reinforcement Learning

Lianchen Jia (Tsinghua University, China); Chao Zhou (Beijing Kuaishou Technology Co., Ltd, China); Tianchi Huang, Chaoyang Li and Lifeng Sun (Tsinghua University, China)

Adaptive video streaming has been studied for over 10 years and has demonstrated remarkable performance. However, adaptive video streaming is not an independent algorithm; it relies on other components of the video system. Consequently, as those components are optimized, the gap between traditional simulators and the real-world system keeps growing, and the adaptive streaming algorithm must adapt to these variations. To address the challenges facing industrial adaptive video streaming, we introduce Backwave, a novel offline reinforcement learning framework that leverages history logs to reduce the sim-to-real gap. We propose new metrics based on counterfactual reasoning to evaluate its performance, integrate expert knowledge to generate valuable data that mitigates the issue of data override, and employ curriculum learning to minimize additional errors. We deployed Backwave on Kuaishou, a mainstream commercial short-video platform. In a series of A/B tests conducted over nearly one month with more than 400M daily watch times, Backwave consistently outperforms prior algorithms. Specifically, it reduces stall time by 0.45% to 8.52% while maintaining comparable video quality, and improves average play duration by 0.12% to 0.16% and overall play duration by 0.12% to 0.26%.
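Backwave's pipeline is proprietary; as a generic stand-in for offline RL on logged ABR data, here is a tabular Q-learning loop with a CQL-style conservatism penalty that keeps the learned policy close to the behaviors seen in the logs (states, bitrate actions, and rewards are synthetic):

```python
import numpy as np

rng = np.random.default_rng(4)

# Logged tuples from a production ABR policy: state index, bitrate action,
# QoE reward, next state. A tiny tabular stand-in for real history logs.
S, A = 8, 3
logs = [(rng.integers(S), rng.integers(A), rng.normal(), rng.integers(S))
        for _ in range(5000)]

Q = np.zeros((S, A))
alpha, gamma, beta = 0.1, 0.9, 1.0
for s, a, r, s2 in logs:
    # standard off-policy TD update on the logged transition
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    # conservative penalty (gradient of logsumexp(Q) - Q[s, a]): pushes
    # down Q on actions the logs do not support
    Q[s] -= alpha * beta * (np.exp(Q[s]) / np.exp(Q[s]).sum() - np.eye(A)[a])

print("greedy bitrate per state:", Q.argmax(axis=1))
```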

GraphProxy: Communication-Efficient Federated Graph Learning with Adaptive Proxy

Junyang Wang, Lan Zhang, Junhao Wang, Mu Yuan and Yihang Cheng (University of Science and Technology of China, China); Qian Xu (BestPay Co.,Ltd,China Telecom, China); Bo Yu (Bestpay Co., Ltd, China Telecom, China)

Federated graph learning (FGL) enables multiple participants with distributed but connected graph data to collaboratively train a model in a privacy-preserving way. However, the high communication cost hinders the adoption of FGL in many resource-limited or delay-sensitive applications. In this work, we focus on reducing the communication cost incurred by transmitting neighborhood information in FGL. We propose to search for local proxies that can substitute for external neighbors, and develop a novel federated graph learning framework named GraphProxy. GraphProxy uses representation similarity and class correlation to select local proxies for external neighbors, and dynamically adjusts the proxy strategy as node representations change during iterative training. We also provide a theoretical analysis showing that a proxy node has a similar influence on training when it is sufficiently similar to the external one. Extensive evaluations show the effectiveness of our design: e.g., GraphProxy achieves an 8x improvement in communication efficiency with only 0.14% performance degradation.
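A minimal sketch of the proxy-selection step, assuming cosine similarity on node representations plus a class-agreement filter (GraphProxy's exact scoring may differ):

```python
import numpy as np

rng = np.random.default_rng(5)

# Node representations: 50 local nodes, plus 5 external neighbors whose
# embeddings would otherwise have to be fetched every training round.
local = rng.normal(size=(50, 16))
external = rng.normal(size=(5, 16))
local_labels = rng.integers(0, 3, 50)
external_labels = rng.integers(0, 3, 5)

def cosine(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

sim = cosine(external, local)                    # representation similarity
same_class = external_labels[:, None] == local_labels[None, :]
sim[~same_class] -= 1e9                          # require class agreement
proxy = sim.argmax(axis=1)                       # local proxy per external
print("proxy node per external neighbor:", proxy)
```

In the full framework this selection would be re-run periodically, since representations drift as training proceeds.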

Learning Context-Aware Probabilistic Maximum Coverage Bandits: A Variance-Adaptive Approach

Xutong Liu (The Chinese University of Hong Kong, Hong Kong); Jinhang Zuo (University of Massachusetts Amherst & California Institute of Technology, USA); Junkai Wang (Fudan University, China); Zhiyong Wang (The Chinese University of Hong Kong, Hong Kong); Yuedong Xu (Fudan University, China); John Chi Shing Lui (Chinese University of Hong Kong, Hong Kong)

Probabilistic maximum coverage (PMC) is an important framework that can model many network applications, including mobile crowdsensing, content delivery, and task replication. In PMC, an operator chooses nodes in a graph that can probabilistically cover other nodes, aiming to maximize the total reward from the covered nodes. To tackle the challenge of unknown parameters in network environments, PMC has been studied in the online learning setting, i.e., as the PMC bandit. However, existing PMC bandits lack context-awareness and fail to exploit valuable contextual information, limiting their efficiency and adaptability in dynamic environments. To address this limitation, we propose a novel context-aware PMC bandit model (C-PMC). C-PMC employs a linear structure to model the mean outcome of each arm, efficiently incorporating contextual features and enhancing its applicability to large-scale network systems. We then design a variance-adaptive contextual combinatorial upper confidence bound algorithm (VAC2UCB), which utilizes second-order statistics, specifically the variance, to re-weight feedback data and estimate unknown parameters. Our theoretical analysis shows that C-PMC achieves a regret of Õ(d√(VT)), independent of the number of edges E and the action size K. Finally, we conduct experiments on synthetic and real-world datasets, showing the superior performance of VAC2UCB in context-aware mobile crowdsensing and user-targeted content delivery applications.
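The flavor of variance-adaptive estimation can be sketched with a variance-weighted ridge regression and a UCB score (the toy below uses oracle variances and invented constants; VAC2UCB's confidence radius and variance estimator are derived in the paper):

```python
import numpy as np

rng = np.random.default_rng(6)
d, lam = 4, 1.0
theta_true = rng.normal(size=d) / np.sqrt(d)

V = lam * np.eye(d)   # variance-weighted design matrix
b = np.zeros(d)

for t in range(500):
    x = rng.normal(size=d); x /= np.linalg.norm(x)     # arm context
    p = np.clip(x @ theta_true + 0.5, 0.05, 0.95)      # coverage prob.
    y = rng.random() < p                               # Bernoulli feedback
    var = p * (1 - p)                                  # oracle variance
    w = 1.0 / max(var, 1e-2)                           # variance re-weighting
    V += w * np.outer(x, x)
    b += w * y * x

theta_hat = np.linalg.solve(V, b)
x_new = rng.normal(size=d); x_new /= np.linalg.norm(x_new)
ucb = x_new @ theta_hat + np.sqrt(x_new @ np.linalg.solve(V, x_new))
print(theta_hat.round(2), round(float(ucb), 3))
```

Low-variance observations get more weight, which is what tightens the regret to depend on the cumulative variance V rather than on T alone.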

Session Chair

Walter Willinger (NIKSUN, USA)

Session E-11: Machine Learning 5

Conference: 3:30 PM — 5:00 PM PDT
Local: May 23 Thu, 6:30 PM — 8:00 PM EDT
Location: Regency E

Taming Subnet-Drift in D2D-Enabled Fog Learning: A Hierarchical Gradient Tracking Approach

Evan Chen (Purdue University, USA); Shiqiang Wang (IBM T. J. Watson Research Center, USA); Christopher G. Brinton (Purdue University, USA)

Federated learning (FL) encounters scalability challenges when implemented over fog networks. Semi-decentralized FL (SD-FL) addresses this by dividing model cooperation into two stages: at the lower stage, device-to-device (D2D) communication is employed for local model aggregations within subnetworks (subnets), while the upper stage handles device-server (DS) communications for global model aggregations. However, existing SD-FL schemes rely on gradient diversity assumptions that become performance bottlenecks as data distributions become more heterogeneous. In this work, we develop semi-decentralized gradient tracking (SD-GT), the first SD-FL methodology that removes the need for such assumptions by incorporating tracking terms into device updates at each communication layer. Our analytical characterization of SD-GT reveals convergence upper bounds for both non-convex and strongly-convex problems under a suitable choice of step size. We use the resulting bounds to develop a co-optimization algorithm that tunes subnet sampling rates and the number of D2D rounds according to a performance-efficiency trade-off. Our numerical evaluations demonstrate that SD-GT obtains substantial improvements in trained model quality and communication cost relative to SD-FL and gradient-tracking baselines on several datasets.
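Gradient tracking itself is standard; a minimal single-subnet sketch on quadratic local losses (mixing matrix and step size are illustrative) shows the tracking-term update that SD-GT builds on:

```python
import numpy as np

rng = np.random.default_rng(7)

# Decentralized gradient tracking inside one D2D subnet: n devices with
# quadratic local losses f_i(x) = 0.5 * (x - c_i)^2 whose optima differ
# (heterogeneous data). y_i tracks an estimate of the global gradient.
n, steps, lr = 5, 200, 0.1
c = rng.normal(size=n)                 # local optima differ across devices
W = np.full((n, n), 1.0 / n)           # doubly-stochastic mixing matrix

x = np.zeros(n)
g = x - c                              # local gradients
y = g.copy()                           # tracking variable
for _ in range(steps):
    x = W @ x - lr * y                 # mix, then descend along tracker
    g_new = x - c
    y = W @ y + g_new - g              # gradient-tracking update
    g = g_new

print(x.round(3), "target:", c.mean().round(3))
```

All devices converge to the minimizer of the average loss without any bounded-gradient-diversity assumption, which is exactly why the tracking term is useful under heterogeneity.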

Towards Efficient Asynchronous Federated Learning in Heterogeneous Edge Environments

Yajie Zhou (Zhejiang University, China); Xiaoyi Pang (Wuhan University, China); Zhibo Wang and Jiahui Hu (Zhejiang University, China); Peng Sun (Hunan University, China); Kui Ren (Zhejiang University, China)

Federated learning (FL) is widely used in edge environments as a privacy-preserving collaborative learning paradigm. However, edge devices often have heterogeneous computation capabilities and data distributions, hampering the efficiency of co-training. Existing works develop staleness-aware semi-asynchronous FL that reduces the contribution of slow devices to the global model to mitigate their negative impact. But this prevents data on slow devices from being fully leveraged in global model updating, exacerbating the effects of data heterogeneity. In this paper, to cope with both system and data heterogeneity, we propose EAFL, a clustering and two-stage aggregation-based Efficient Asynchronous Federated Learning framework that achieves better learning performance with higher efficiency in heterogeneous edge environments. In EAFL, we first propose a gradient-similarity-based dynamic clustering mechanism that groups devices with similar system and data characteristics during training. We then develop a novel two-stage aggregation strategy consisting of staleness-aware semi-asynchronous intra-cluster aggregation and data-size-aware synchronous inter-cluster aggregation to efficiently and comprehensively aggregate training updates across heterogeneous clusters. This simultaneously alleviates the negative impacts of slow devices and non-IID data, achieving efficient collaborative learning. Extensive experiments demonstrate that EAFL is superior to state-of-the-art methods.
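A toy rendering of the two-stage aggregation (the clustering rule and weighting formulas below are invented stand-ins, not EAFL's mechanisms):

```python
import numpy as np

rng = np.random.default_rng(8)

# 6 devices with flattened gradient updates; cluster crudely by gradient
# direction, then staleness-weight inside clusters and data-size-weight
# across clusters.
grads = rng.normal(size=(6, 10))
stale = np.array([0, 2, 1, 0, 3, 1])          # rounds behind, per device
sizes = np.array([100, 80, 120, 60, 90, 110])  # local dataset sizes
unit = grads / np.linalg.norm(grads, axis=1, keepdims=True)
labels = (unit @ unit[0] < 0).astype(int)      # 2-cluster sign split

cluster_updates, cluster_sizes = [], []
for k in (0, 1):
    idx = np.where(labels == k)[0]
    if idx.size == 0:
        continue
    w = sizes[idx] / (1.0 + stale[idx])        # discount stale devices
    cluster_updates.append((w[:, None] * grads[idx]).sum(0) / w.sum())
    cluster_sizes.append(sizes[idx].sum())

cw = np.array(cluster_sizes, dtype=float)      # size-aware inter-cluster
global_update = (cw[:, None] * np.array(cluster_updates)).sum(0) / cw.sum()
print(global_update.round(3))
```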

Personalized Prediction of Bounded-Rational Bargaining Behavior in Network Resource Sharing

Haoran Yu and Fan Li (Beijing Institute of Technology, China)

There have been many studies leveraging bargaining to incentivize the sharing of network resources between resource owners and seekers. They predict bargaining behavior and outcomes mainly by assuming that bargainers are fully rational and possess sufficient knowledge about their opponents. Our work addresses the prediction of bargaining behavior in network resource sharing scenarios where these assumptions do not hold, i.e., where bargainers are bounded-rational and have heterogeneous knowledge. Our first key idea is to use a multi-output Long Short-Term Memory (LSTM) neural network to learn bargainers' behavior patterns and predict both their discrete and continuous decisions. Our second key idea is to assign a unique latent vector to each bargainer, characterizing the heterogeneity among bargainers. We propose a scheme to jointly learn the LSTM weights and latent vectors from real bargaining data, and utilize them to achieve personalized behavior prediction. We prove that estimating our LSTM weights corresponds to a special design of LSTM training, and we theoretically characterize the performance of our scheme. To handle large-scale datasets in practice, we further propose a variant of our scheme that accelerates LSTM training. Experiments on a large real-world bargaining dataset demonstrate that our schemes achieve more accurate personalized predictions than baselines.
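A minimal PyTorch sketch of the latent-vector idea, with assumed dimensions and head names (not the authors' exact architecture): a per-bargainer embedding is concatenated to each input step, and two heads emit the discrete and continuous decisions.

```python
import torch
import torch.nn as nn

class BargainLSTM(nn.Module):
    """Multi-output LSTM with a learned latent vector per bargainer."""
    def __init__(self, n_bargainers, feat_dim=8, latent_dim=4, hidden=32):
        super().__init__()
        self.latent = nn.Embedding(n_bargainers, latent_dim)
        self.lstm = nn.LSTM(feat_dim + latent_dim, hidden, batch_first=True)
        self.head_discrete = nn.Linear(hidden, 3)    # e.g., accept/reject/counter
        self.head_continuous = nn.Linear(hidden, 1)  # e.g., counter-offer price

    def forward(self, feats, bargainer_id):
        z = self.latent(bargainer_id)                     # (B, latent)
        z = z.unsqueeze(1).expand(-1, feats.size(1), -1)  # tile over rounds
        h, _ = self.lstm(torch.cat([feats, z], dim=-1))
        return self.head_discrete(h[:, -1]), self.head_continuous(h[:, -1])

model = BargainLSTM(n_bargainers=100)
feats = torch.randn(2, 5, 8)                 # 2 sequences, 5 rounds each
ids = torch.tensor([3, 42])                  # which bargainer each belongs to
logits, offer = model(feats, ids)
print(logits.shape, offer.shape)             # (2, 3) and (2, 1)
```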

PPGSpotter: Personalized Free Weight Training Monitoring Using Wearable PPG Sensor

Xiaochen Liu, Fan Li, Yetong Cao, Shengchun Zhai and Song Yang (Beijing Institute of Technology, China); Yu Wang (Temple University, USA)

Free weight training (FWT) is of utmost importance for physical well-being. However, the success of FWT depends on choosing the suitable workload, as improper selections can lead to suboptimal outcomes or injury. Current workload estimation approaches rely on manual recording and specialized equipment, with limited feedback. Therefore, we introduce PPGSpotter, a novel PPG-based system for FWT monitoring in a convenient, low-cost, and fine-grained manner. By characterizing the arterial geometry compressions caused by the deformation of distinct muscle groups during various exercises and workloads in PPG signals, PPGSpotter can infer essential FWT factors such as workload, repetitions, and exercise type. To remove pulse-related interference that heavily contaminates PPG signals, we develop an arterial interference elimination approach based on adaptive filtering, effectively extracting the pure motion-derived signal (MDS). Furthermore, we explore 2D representations within the phase space of MDS to extract spatiotemporal information, enabling PPGSpotter to address the challenge of resisting sensor shifts. Finally, we leverage a multi-task CNN-based model with workload adjustment guidance to achieve personalized FWT monitoring. Extensive experiments with 15 participants confirm that PPGSpotter can achieve workload estimation (0.59 kg RMSE), repetitions estimation (0.96 reps RMSE), and exercise type recognition (91.57% F1-score) while providing valid workload adjustment recommendations.
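The adaptive-filtering step can be sketched with a basic LMS canceller (all signals below are synthetic; in practice the pulse reference would itself have to be estimated, e.g., from heart-rate tracking, rather than being known exactly):

```python
import numpy as np

rng = np.random.default_rng(9)

# Toy LMS adaptive filter: cancel a pulse-like reference from the raw PPG,
# leaving the motion-derived signal (MDS) caused by muscle deformation.
fs = 100
t = np.arange(0, 10, 1 / fs)
pulse = np.sin(2 * np.pi * 1.2 * t)                  # ~72 bpm pulse wave
motion = 0.5 * np.sign(np.sin(2 * np.pi * 0.4 * t))  # lifting cadence
ppg = pulse + motion + 0.05 * rng.normal(size=t.size)

L, mu = 8, 0.01
w = np.zeros(L)
mds = np.zeros(t.size)
for n in range(L, t.size):
    ref = pulse[n - L:n]        # reference correlated with the pulse only
    e = ppg[n] - w @ ref        # error = motion + noise estimate
    w += mu * e * ref           # LMS weight update
    mds[n] = e

print("residual error power:", np.mean((mds[200:] - motion[200:]) ** 2))
```

The recovered MDS would then feed the phase-space 2D representation and the multi-task CNN described above.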
Speaker: Xiaochen Liu (Beijing Institute of Technology, China)



Session Chair

Yuval Shavitt (Tel-Aviv University, Israel)
